D1154-D1158 Nucleic Acids Research, 2014, Vol. 42, Database issue 
doi:10.1093/nar/gkt1157



Published online 21 November 2013 



CAMP: Collection of sequences and structures 
of antimicrobial peptides 

Faiza Hanif Waghu, Lijin Gopi, Ram Shankar Barai, Pranay Ramteke, Bilal Nizami and 
Susan Idicula-Thomas* 

Biomedical Informatics Centre of ICMR, National Institute for Research in Reproductive Health, Mumbai 
400012, Maharashtra, India 



Received September 14, 2013; Revised and Accepted October 25, 2013 



ABSTRACT 

Antimicrobial peptides (AMPs) are gaining import- 
ance as anti-infective agents. Here we describe the 
updated Collection of Antimicrobial Peptide (CAMP) 
database, available online at http://www.camp.bicnirrh.res.in/.
The 3D structures of peptides are
known to influence antimicrobial activity. Although
there exist databases of AMPs, information on
structures of AMPs is limited in these databases. 
CAMP is manually curated and currently holds 6756 
sequences and 682 3D structures of AMPs. Sequence 
and structure analysis tools have been incorporated 
to enhance the usefulness of the database. 



INTRODUCTION 

Antimicrobial peptides (AMPs) are widely studied as 
potential alternatives to antibiotics. The surge in research
on AMPs has led to the development of several databases
and prediction tools. Some of these are general databases
such as APD2 (1), DAMPD (2) and LAMP (3),
whereas others are specialized databases: AMSdb
(http://www.bbcm.units.it/~tossi/pag1.htm) contains
AMPs from only plant and animal sources; RAPD (4) 
provides information on recombinant methods to 
generate AMPs; PhytAMP (5) and BACTIBASE (6) are 
databases dedicated to AMPs from plant and bacterial 
sources, respectively; Defensins knowledgebase (7) and 
PenBase (8) are devoted to AMPs from defensin 
and penaeidin families, respectively; Peptaibol Database 
(9) is a database of peptaibols (unusual class of 
peptides); BAGEL (10) is a database of bacteriocins; 
and HIPdb (11) is a database of experimentally validated
HIV-inhibiting peptides. The enormous amount of data
on AMPs motivated us to develop a general
database, Collection of Antimicrobial Peptides (CAMP) 
(12), which included a sequence-based prediction tool 
for AMPs. 



While all these databases provide comprehensive informa- 
tion on sequences of AMPs, information on structures of 
AMPs is limited. The topological features of peptides play a 
crucial role in dictating antimicrobial activity (13). Although 
many sequence-based prediction algorithms are available, 
the knowledge of 3D structural features of known AMPs 
has not been exploited to develop prediction algorithms. The 
lack of structural databases of AMPs is probably one of the 
main impediments in this direction. Presently, there are 
several AMPs whose structural information is available in 
the Protein Data Bank (PDB) (14). However, retrieving in- 
formation on structures of AMPs from the structural data- 
bases such as PDB is not a trivial task; for example, the 
structures may have additional chains that are non-AMPs, 
and these have to be filtered out by manual curation. The 
structures may also not be easily retrieved from structure 
databases based on simple keyword searches such as 'anti- 
bacterial', 'antifungal', etc. To address these shortcomings, 
the current release of CAMP has been developed. 

MATERIALS AND METHODS 

Data collection and organization 

Sequence and structural information of AMPs was 
retrieved from protein databases of NCBI, UniProtKB 
(15) and PDB using combinations of keywords such as 'antimicrobial',
'antibacterial', 'antifungal', 'antiviral' and
'antiparasitic'. Manually curated information related to 
sequence, structure, protein definition, accession 
numbers, reference literature, activity, taxonomy of the 
source organism, target organisms with minimum 
inhibitory concentration (MIC) values, hemolytic 
activity of the peptide, functional and structural classifi- 
cations, protein family descriptions and links to external 
databases like UniProtKB, PDB, PubMed and other 
AMP databases is made available to the users. 

Database architecture 

The updated CAMP database is built on Apache HTTP 
server 2.0.59. MySQL Server 5.0 is used at the back-end, 
whereas the front-end is built using PHP, HTML,
JavaScript, Perl and Open Flash Chart 2. 

Below is a brief description of the user interface of 
CAMP: 

(1) Home: The CAMP database along with its various 
features is described in this section. 

(2) Databases: Data are sectioned into sequence, struc- 
ture and patent databases. 

(3) Tools: The following analysis tools are available to 
the users. 

(a) AMP prediction: Users can predict AMPs and/or 
scan for antimicrobial regions within the peptides 
using Support Vector Machine (SVM), Random 
Forests (RF) and Artificial Neural Network 
(ANN). 

(b) Feature calculator: Amino acid composition, 
secondary structural propensities and 
physicochemical properties such as net charge, 
hydrophobicity, etc., of the peptides can be
calculated. 

(c) BLAST: Users can run the BLAST (16) tool against
the sequence or structure database of CAMP to 
find homologous sequences or structures, 
respectively. 

(d) ClustalW: Multiple sequence alignment of the 
peptides can be obtained using ClustalW (17) 
tool from EMBL-EBI. 

(e) Vector Alignment Search Tool: Similar protein 
structures can be identified using this NCBI 
tool (18). 

(f) PRATT: This tool from ExPASy can be used to 
find patterns in a set of related AMPs (19,20). 

(g) Helical wheel: Alpha-helical AMPs can be 
studied using the helical wheel Java applet 
created by Edward K. O'Neil and Charles 
M. Grisham (University of Virginia in 
Charlottesville, Virginia). 

(h) PDB2PQR: This clone server can be used for converting
PDB files into the PQR file format (PQR files
are PDB files in which the B-factor and occupancy
columns have been replaced by radius and per-atom
charge, respectively), which can be used
for further structural studies (21,22).

(4) Search: Users can search for sequences and/or struc- 
tures of AMPs using basic and advanced search 
options. 

(5) Links to other available AMP databases have been 
provided. 

(6) Statistics: Coverage of the database based on the 
nature of data, taxonomy of source organism and 
activity is depicted using pie charts and a
Venn diagram.

(7) Help: A detailed explanation about the features and 
tools available in the database has been provided in 
this section. 
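
As an illustration of the kind of quantities the feature calculator in item (3b) reports, the following minimal Python sketch computes amino acid composition and a crude net charge. It is not the CAMP implementation: the function names are hypothetical, and the charge rule (+1 for Lys/Arg, -1 for Asp/Glu, with His and the termini ignored) is a simplifying assumption.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aa_composition(seq):
    """Return the percentage frequency of each standard amino acid in seq."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in STANDARD_AA}


def net_charge(seq):
    """Crude net charge (assumption): +1 per Lys/Arg, -1 per Asp/Glu."""
    seq = seq.strip().upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


if __name__ == "__main__":
    toy_peptide = "GKKLLRDE"  # made-up sequence, for illustration only
    print(aa_composition(toy_peptide)["K"])
    print(net_charge(toy_peptide))
```

A pH-dependent charge model or a specific hydrophobicity scale could be swapped in; the paper does not state which definitions the CAMP tool uses.
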

Prediction algorithm 

Dataset creation 

The positive dataset consisted of 3010 AMP sequences.
These were obtained from the patent and experimentally 



validated datasets of CAMP, after removing sequences 
that (i) are redundant (100% similarity cut-off), (ii) 
have non-standard amino acids and (iii) have length
>100. The CD-HIT server was used for removing redundant
sequences (23). 

The negative dataset consists of 4011 sequences, 
generated in our previous work (12). It includes experi- 
mentally proven non-antimicrobial sequences, arbitrary 
sequences generated using random numbers and protein 
sequences retrieved from the UniProt database without 
annotation as 'antimicrobial'. The sequences had length 
approximately in the same range as the positive dataset. 
The CD-HIT program (23) was used to eliminate se- 
quences with >90% identity. These datasets were 
randomly divided into training (70%) and test (30%) 
datasets. 
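
The filtering and splitting steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: exact-duplicate removal stands in for CD-HIT clustering at a 100% cut-off, and the function names and the fixed random seed are hypothetical.

```python
import random

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 100  # sequences longer than this are discarded


def filter_sequences(seqs):
    """Drop duplicates, non-standard residues and sequences with length > 100.

    Exact-match deduplication is an assumption standing in for CD-HIT.
    """
    seen = set()
    kept = []
    for seq in seqs:
        seq = seq.strip().upper()
        if not seq or seq in seen:
            continue
        if len(seq) > MAX_LENGTH or not set(seq) <= STANDARD_AA:
            continue
        seen.add(seq)
        kept.append(seq)
    return kept


def split_train_test(seqs, train_fraction=0.7, seed=0):
    """Randomly split sequences into training and test sets (70/30 by default)."""
    shuffled = list(seqs)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    raw = ["ACDK", "acdk", "AXCK", "K" * 101, "GIGK"]  # toy input
    clean = filter_sequences(raw)
    train, test = split_train_test(clean)
    print(clean, len(train), len(test))
```
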

Model generation 

The 64 best peptide descriptors, ranked by RF Gini
score, were used for developing SVM-, RF- and ANN-based
prediction models. All the models were evaluated
using Matthews correlation coefficient (MCC), prediction 
accuracy and 10-fold cross-validation accuracy on training 
and test datasets. For developing the prediction models, 
implementation of SVM, RF and ANN in R (version 
2.15.3) was used (24). 
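
For reference, the two headline metrics can be computed from a binary confusion matrix as in the Python sketch below. It uses the standard definitions, accuracy = (TP + TN) / N and MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names and the label convention (1 = AMP, 0 = non-AMP) are assumptions for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for labels coded as 1 (AMP) and 0 (non-AMP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified sequences."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]  # toy labels, not results from the paper
    y_pred = [1, 1, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print(accuracy(*counts), mcc(*counts))
```
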

SVM 

The kernlab package in R was used to train the SVM classifier
(25). In this study, we used a polynomial kernel
function. The values of the hyperparameters were set as
follows: degree = 4, scale = 0.01 and offset = 1.
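
To make the role of these three hyperparameters concrete, the sketch below evaluates a polynomial kernel of the form k(x, y) = (scale · <x, y> + offset)^degree, which is the form used by kernlab's polydot kernel. The Python function is an illustrative re-statement with the reported values as defaults, not the authors' R code, and the descriptor vectors are made up.

```python
def polynomial_kernel(x, y, degree=4, scale=0.01, offset=1.0):
    """Polynomial kernel: (scale * <x, y> + offset) ** degree."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (scale * dot + offset) ** degree


if __name__ == "__main__":
    # Hypothetical descriptor vectors for two peptides.
    u = [10.0, 0.0]
    v = [10.0, 0.0]
    print(polynomial_kernel(u, v))
```

With a small scale such as 0.01, the inner product contributes little relative to the offset unless descriptor values are large, so descriptor scaling affects how strongly the degree-4 term shapes the decision boundary.
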

RF 

The 'randomForest' package was used to train the RF classifier
with a maximum of 1500 trees (26).

ANN 

The 'nnet' package in R was used for building the ANN-based
prediction model (27). 



RESULTS AND DISCUSSION 

The updated CAMP is a comprehensive database on se- 
quences and structures of AMPs. It currently holds 6756 
sequences of AMPs (experimentally validated (2602), pre- 
dicted (2438) and patents (1716)), which include 2736 
recently identified AMP sequences. The information on 
the sequence, AMP family, source, target organism and 
activity is captured in the database. As can be seen in 
Figure 1A-C, CAMP has a wide coverage on the above 
fields. 

CAMP presently contains 682 AMP structures. 
Multiple structures of AMPs, if available in PDB, are 
also integrated in the database. Although structural infor- 
mation on AMPs is available in databases such as APD2, 
LAMP, etc., the structures can be directly viewed using
the Jmol viewer in CAMP. Direct viewing of structures is
also available in Defensins knowledgebase, PhytAMP,
HIPdb and BACTIBASE. However, these databases
cater to specific classes of AMPs.

Figure 1. (A) Pie chart of AMP families in CAMP, (B) Pie chart of source organisms of AMPs in CAMP, (C) Venn diagram of classification of
AMP activity in CAMP and (D) Relative amino acid composition of experimentally validated and predicted sequences of AMPs in CAMP as 
compared with Swiss-Prot composition. 



Another interesting feature of the current release of 
CAMP is that users can selectively retrieve information 
on specific families of AMPs of their interest; e.g. 
cathelicidins, defensins and cecropins. The AMP fam- 
ily information for the peptides has been annotated 
manually using information from Pfam (28), 
InterPro (29) and associated literature. The distribution 
of the AMP families in the database can be seen in 
Figure 1A. 

The prediction algorithm for AMPs has been modified 
using the updated sequence information. Supplementary 
Table S1 shows the prediction accuracy, MCC and cross-validation
accuracy of the prediction models. Users can
predict the antimicrobial activity of proteins and/or scan 
regions (with user-defined lengths) within proteins for 
antimicrobial activity. 
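
Scanning a protein for antimicrobial regions of a user-defined length amounts to sliding a fixed-size window along the sequence and scoring each fragment. The Python sketch below shows this idea under stated assumptions: `toy_score` is a hypothetical stand-in for a trained SVM, RF or ANN model, and the threshold and function names are illustrative rather than taken from CAMP.

```python
def scan_regions(sequence, window, score_fn, threshold=0.5):
    """Slide a window along sequence and keep fragments scoring >= threshold.

    Returns (start, end, fragment, score) tuples with 1-based positions.
    """
    if window <= 0 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    hits = []
    for start in range(len(sequence) - window + 1):
        fragment = sequence[start:start + window]
        score = score_fn(fragment)
        if score >= threshold:
            hits.append((start + 1, start + window, fragment, score))
    return hits


def toy_score(fragment):
    """Hypothetical scorer (not a CAMP model): fraction of Lys/Arg residues."""
    return sum(fragment.count(aa) for aa in "KR") / len(fragment)


if __name__ == "__main__":
    for hit in scan_regions("AAKKRAA", window=3, score_fn=toy_score, threshold=1.0):
        print(hit)
```
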



Tools that aid in sequence and structure analysis such as 
feature calculator, PRATT, ClustalW, Vector Alignment 
Search Tool, BLAST and PDB2PQR have also been 
incorporated in CAMP. Effect of mutations on the struc- 
ture of AMPs and/or their analogs can be visualized using 
the Jmol visualizer integrated in the database. Helicity is 
known to influence antimicrobial activity (30); therefore,
a tool for helical wheel projection is also available.
AMPs are known to be rich in hydrophobic and cationic 
amino acids. The ratio of the percentage frequency of 
amino acids in CAMP to the percentage frequency of 
amino acids in UniProtKB/Swiss-Prot protein 
knowledgebase (Release 2013_08 of 24 July 2013) is 
plotted in Figure 1D. As expected, AMPs were observed
to be enriched in positively charged and hydrophobic
residues such as Arg, Lys, Gly, Cys, Trp and Val.
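
The ratio plotted in Figure 1D can be reproduced in outline with the Python sketch below, which divides the percentage frequency of each residue in a set of AMP sequences by its percentage frequency in a background set. The sequences and background values in the example are made up for illustration; they are not CAMP or Swiss-Prot data, and the function names are hypothetical.

```python
from collections import Counter


def percent_frequencies(sequences):
    """Percentage frequency of each residue pooled over all sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to count")
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def enrichment_ratios(amp_freq, background_freq):
    """Ratio of AMP to background frequency; values > 1 indicate enrichment."""
    return {
        aa: amp_freq[aa] / background_freq[aa]
        for aa in amp_freq
        if background_freq.get(aa)  # skip residues missing from the background
    }


if __name__ == "__main__":
    toy_amps = ["KKA", "KA"]              # made-up sequences
    toy_background = {"K": 6.0, "A": 8.0}  # made-up percentages
    print(enrichment_ratios(percent_frequencies(toy_amps), toy_background))
```
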

CONCLUSIONS

CAMP holds a massive update on AMP sequences and 
incorporates several tools relevant to design of AMPs. The 
3D conformations of peptides are known to be critical 
determinants of antimicrobial activity. The prominent 
feature of the current release of CAMP is the addition 
of experimentally derived structures of AMPs, which can 
be directly viewed using the Jmol viewer. The update also 
facilitates family-based study on AMPs. A detailed com- 
parison of CAMP with the existing databases on AMPs is 
presented in Table 1. The information, present in an easily
searchable and downloadable form, is envisaged to accel- 
erate sequence-structure-activity studies on AMPs. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 